Executive Summary


This report is part of a series of studies on teacher evaluation in Chicago. Previous reports focused on implementation and on teacher and administrator perceptions of Chicago’s new evaluation system. This report addresses differences in teacher observation and value-added scores across schools and how those scores are related to school and teacher characteristics.


In the fall of 2012, Chicago Public Schools (CPS) instituted a sweeping reform of its teacher evaluation system with the introduction of Recognizing Educators Advancing Chicago’s Students (REACH). REACH replaces CPS’s former 1970s-era evaluation system. That checklist system fell short both in providing teachers with meaningful feedback for improving their instructional practice and in identifying which teachers excelled and which needed improvement. Over the course of only a few years, most districts across the nation have moved from similar checklist evaluation systems to new evaluation systems that use an evidence-based observation rubric, a formal feedback process, and measures of student growth in teachers’ evaluations. These data provide additional information on individual teachers’ practice as well as better information about the quality of a district’s overall teaching workforce.

This report is part of a series of Consortium studies on teacher evaluation in Chicago. Previous reports focused on implementation and on teacher and administrator perceptions of REACH.¹ These reports found that REACH has provided more differentiation among teachers and that most teachers and administrators had positive opinions about the new system, especially the observation process. Overwhelming majorities of teachers and administrators believe the observation process supports teacher growth, identifies areas of strength and weakness, and provides opportunities for reflection. Teachers remain apprehensive, however, about the inclusion of student growth metrics in their evaluations.

1 Sporte, Stevens, Healey, Jiang, & Hart (2013); Jiang, Sporte, & Luppescu (2013); Jiang & Sporte (2014); Jiang, Sporte, & Luppescu (2015).

REACH and other new evaluation systems have provided new data and information about the quality and distribution of the overall teaching workforce. With this new information it is now possible to gauge the degree to which different groups of students have equitable access to teachers with high scores using actual metrics of teaching practice rather than proxies for teaching quality such as scores on certification tests, experience, or degrees earned.

This report addresses differences in teacher observation and value-added scores across schools and how those scores are related to school and teacher characteristics. It provides a descriptive analysis of the relationship between teachers’ evaluation scores and both school and individual teacher characteristics. It examines multiple sources of data from 2013-14, REACH’s second year of implementation, including teacher observation scores and value-added scores, student demographic and test score data, survey responses, and teacher personnel data. These data represent the first fully comprehensive snapshot of evaluation scores for teachers in Chicago. For the first time, we have measures from observations for most of CPS’s 19,000 teachers and individual value-added measures for those teachers teaching tested subjects and grades. By comparing the patterns of school and teacher characteristics on both observation and value-added scores, this report begins to surface questions about the instruction experienced by Chicago’s students, and about the degree to which individual measures of teacher effectiveness may be sensitive to school context or other external factors.


Key Findings 


• Schools serving the most disadvantaged students have an overrepresentation of teachers with the lowest value-added and observation scores. On both value-added and observation scores, teachers with the lowest scores are overrepresented in schools with the highest concentration of low-income students. This overrepresentation persists even after controlling for differences in teacher characteristics, such as experience levels, between high- and low-poverty schools.


• Observation scores have a stronger relationship with school characteristics, such as the percentage of economically disadvantaged students, than value-added scores. On observation scores, teachers in lower-poverty schools have substantially higher scores on average than teachers in higher-poverty schools. Differences on value added are much smaller, as value-added measures explicitly control for student characteristics such as poverty and previous achievement. It is not clear whether this relationship exists because observations are reflecting differences in instruction in low- and high-poverty schools, or because it is harder to be effective or to receive high observation ratings in a high-poverty school.


• Teachers in schools with stronger organizational climates have higher evaluation scores.² Teachers in schools with better organizational and learning climates tend to have higher value-added and observation scores. These differences remain significant when comparing schools with similar student populations.

2 As measured by the 5Essentials and Supplemental Survey Measures from the CPS 2014 My Voice, My School surveys.


• There are some differences in teachers’ evaluation scores, depending on experience and credentials. Teachers with more experience have higher scores on value added and observations than new teachers. Differences between teachers with National Board Certification or advanced degrees, compared to those without those credentials, were found only on observation scores, not value added.


• Minority teachers have lower observation scores than white teachers, but no significant differences on value added. On average, minority teachers’ observation scores were lower than white teachers’ observation scores. However, a large proportion of this difference is due to the substantial relationship between observation scores and school characteristics, such as school-level poverty, as minority teachers are overrepresented in the highest-poverty schools and underrepresented in the lowest-poverty schools. There were no significant differences by teacher race/ethnicity on either reading or math value-added scores.


• Male teachers have lower observation and value-added scores than female teachers. On average, male teachers scored lower on observations and slightly lower on value added than female teachers within the same schools.


While this report describes the relationships between teachers’ evaluation scores, their characteristics, and the context in which they teach, it does not explain why these relationships exist. Districts and states across the nation have found similar patterns in the relationship of teacher evaluation ratings with school poverty. There are many potential explanations for these patterns, and it is too soon to come to any conclusions about their source. Further research is needed to understand the degree to which observation scores, in particular, may be reflecting true differences in instructional practice or reflecting contextual factors, such as classroom or school composition.


Introduction 


In the fall of 2012, Chicago Public Schools (CPS) instituted a reform of its teacher evaluation system with the introduction of REACH Students. This reform was part of an overall movement nationwide to improve how educators are assessed. These new systems have generated a wealth of data. This report uses CPS teacher evaluation data and begins to explore questions about students’ access to effective teaching and the degree to which evaluation scores may be sensitive to student and school characteristics.


Decades of research evidence have consistently found that teachers are the most important in-school factor related to student learning and achievement. Students taught by an effective teacher have better academic outcomes, greater chances of post-secondary success, and higher lifetime earnings.³ At the same time, there are many unresolved questions about how to measure effective teaching, how to develop effective teachers, and how to ensure that all students have access to highly effective teaching. These issues continue to be among the most persistent challenges facing local, state, and federal education policymakers.

Policymakers at the federal level have attempted to both improve teacher quality and address inequities in students’ access to effective teachers. In 2001, the federal government attempted to address inequities in students’ access to effective teachers by monitoring teachers’ qualifications. The No Child Left Behind (NCLB) Act of 2001 required states to ensure all teachers met minimum certification standards.⁴ Beginning in 2009, the federal Race to the Top funding incentivized states to overhaul their teacher evaluation systems, with the premise that better evaluation policies and systems could be a primary vehicle for improving teaching and learning across schools.⁵ That federal focus continues as states have been given opportunities to pursue waivers, exempting them from some components of NCLB. In exchange for flexibility in other areas, they are required to describe their evaluation systems and commit to timelines for evaluation system design and implementation.

3 Aaronson, Barrow, & Sander (2007); Chetty, Friedman, & Rockoff (2013); Goldhaber (2002); Rockoff (2004); Rivkin, Hanushek, & Kain (2005).

4 U.S. Department of Education (2001).


Within the Last Decade, Teacher 
Evaluation Has Taken Center Stage 
in Policy Reform Efforts 

In response to federal policies and incentives, the landscape for teacher evaluation policies and practices has undergone rapid and dramatic change. As of 2013, thirty-five states require the inclusion of student achievement measures in their teacher evaluations and more than half of states require annual evaluations of all teachers.⁶ Although states and districts vary considerably in the measures they use and the weights those measures are given, most now include observations of teacher practice and measures of student growth as part of a combined final evaluation score. And, although states and districts are taking different approaches to implementing these teacher evaluation systems, they commonly aim to improve teacher effectiveness through two key levers: 1) developing teachers’ instructional skills through focused feedback, professional development, and incentives for improvement and 2) holding teachers accountable by incorporating evaluation measures into personnel decisions such as tenure and dismissal.

5 U.S. Department of Education (2009).

6 Steinberg & Donaldson (2014); National Council on Teacher Quality (2013).

These new evaluation systems have brought both opportunities and challenges. For example, teachers and principals believe the new evaluation process is leading to changes in teacher practice and improved communication and collaboration.⁷ In Chicago, overwhelming majorities of administrators and two-thirds of teachers agree that the observation process will lead to better instruction.⁸ There is also limited, but promising, evidence that the use of evidence-based observations improves instructional practice, as well as student learning.⁹ At the same time, districts and states have described challenges related to developing and using evaluation measures and building capacity and sustainability.¹⁰ Teachers remain apprehensive about the inclusion of student growth metrics in their evaluations, and both teachers and administrators report increased levels of stress related to evaluation.¹¹ Despite these tensions, many districts and states have embraced teacher evaluation as a component of their improvement strategy.


Data From New Evaluation Systems Allow Us to Explore Questions About Equity for Teachers and Students


Over the course of only a few years, most districts have moved from an annual checklist conveying little information on teacher performance to detailed reports including evaluation data from multiple classroom observations and measures of student growth.¹² These data not only provide additional information on individual teachers’ practice, but also provide better information about the quality of the district’s overall teaching workforce. Examining the distribution of teachers’ scores and ratings may give insight into how teachers are deployed across a district and the nature of instruction received by students.

7 Jiang & Sporte (2014); Murphy, Cole, Pike, Ansaldo, & Robinson (2014); Sporte et al. (2013).

8 Jiang & Sporte (2014).

9 Steinberg & Sartain (2015); Taylor & Tyler (2013); Grissom, Loeb, & Master (2013).

10 Government Accountability Office (2013).

11 Jiang et al. (2014); Sporte et al. (2013).


Key Elements of REACH

CPS implemented a new teacher evaluation system, REACH, in fall 2012. REACH incorporates a professional practice score and up to two measures of student growth in a teacher’s evaluation score.

Professional Practice is evaluated through four observations using the CPS Framework for Teaching, a modified version of the Charlotte Danielson Framework for Teaching.

Student Growth Measures

Value Added: Teachers in tested subjects and grades receive an individual value-added score. Most teachers in non-tested subjects and grades receive a school-wide average value-added score in literacy.

Performance Tasks: Performance tasks are written or hands-on assessments designed to measure students’ progress toward mastery of a particular skill or standard. There are different performance tasks for each subject and grade. Performance tasks are typically administered and scored by teachers themselves.

For more details on REACH measures, see Appendix A.

Past research has repeatedly shown that economically disadvantaged and minority students are more likely to be taught by inexperienced teachers with lower qualifications, as well as those who have lower value-added scores.¹³ New evaluation systems provide the opportunity to assess teaching, using a broader range of information on instructional quality across the district. It is now possible to gauge the degree to which different groups of students have equitable access to high-scoring teachers, using actual metrics of teaching practice, including observations of classroom teaching, rather than proxies for teaching quality derived from teacher qualifications or certification test scores. It is also now possible to compare and contrast patterns from multiple measures of teacher performance such as observations and value-added scores.

12 Many also use student perception measures, such as student surveys, and other measures of teacher practice, such as student learning objectives.

13 Lankford, Loeb, & Wyckoff (2002); Clotfelter, Ladd, & Vigdor (2006); Goldhaber, Lavery, & Theobald (2015); DeAngelis, Presley, & White (2005).

These data also allow us to investigate the relationship between teacher characteristics and their evaluation scores. In Chicago and elsewhere, the overall teaching workforce has become whiter and less experienced.¹⁴ This is of concern, as research has found that minority teachers tend to have higher expectations for minority students and have positive effects on their academic achievement.¹⁵ If groups of teachers (novice teachers, teachers with advanced certification, or teachers of one race or gender) have systematically higher or lower evaluation scores than their peers, then school and district administrators may need to pay special attention to how these groups of teachers are supported. We can also determine whether markers of teacher quality that are used in the hiring process, including experience and National Board Certification, are related to teachers’ effectiveness ratings. This information could be useful to administrators deciding how strongly to consider these types of qualifications during the hiring process.

While the new measures are likely to improve the identification of schools with more effective teaching practices, there are still questions about whether they can fairly assess individual teacher quality. That is, it is not clear whether all teachers have an equal opportunity to receive strong ratings. There is emerging research on teacher evaluation systems that finds teachers of students with higher prior achievement systematically receive higher observation scores, thus creating an uneven playing field for educators teaching students with lower achievement.¹⁶ A system that makes it more difficult for teachers working in challenging contexts to achieve high ratings could have adverse consequences for equity for both students and teachers. It may lead to effective teachers avoiding teaching in schools where strong teaching practices are most needed, and it may penalize those teachers who do decide to work in those contexts.

14 Moore (2015, July 14); Albert Shanker Institute (2015).

15 Dee (2005); Ferguson (2003); Egalite, Kisida, & Winters (2015).

16 Whitehurst, Chingos, & Lindquist (2014); Steinberg & Garrett (forthcoming).

By using multiple metrics of teaching effectiveness, we can compare teachers’ ratings on value-added scores (which compare gains for students with similar backgrounds and prior achievement) to their ratings on observation metrics (which do not have adjustments for the types of students in a teacher’s classroom). The lack of adjustment could particularly be an issue if certain kinds of teachers are more likely to teach in contexts where it is difficult to get a strong rating. This is especially a concern for groups that are underrepresented in the teacher workforce, including minority teachers and male teachers.

This report presents a descriptive analysis of the relationships of teacher ratings with the context in which they teach, but it does not discern the reasons why the relationships exist. Understanding the extent to which there are systematic differences in teacher ratings across different types of schools is only the first step for addressing possible inequities for students and teachers. Once we understand the extent of systematic differences, then we can examine the reasons they exist and potential ways to address them in future studies.

This report shows the distribution of the observation and value-added scores across schools, and how they are related to school and teacher characteristics. It addresses the following questions:


1. To what extent are measures of teacher effectiveness (value-added and observation scores) related to the characteristics of students in schools?

• Are measures of effectiveness related to the level of student poverty in schools?

• Are these relationships different for value-added and observation scores?


Teacher Evaluation in Chicago 


2. To what extent are measures of teacher effectiveness related to individual teacher characteristics? Are these relationships different for value-added and observation scores?

• Do teachers with more years of experience have higher value-added and/or observation scores? What about teachers with advanced degrees or National Board Certification?

• Are there differences in REACH scores by teachers’ race or gender?

• To what extent are any differences explained by differences in school characteristics?


This Study Extends Prior Research 
on Effective Teaching 

The data set examined in this study represents the first fully comprehensive snapshot of observation scores for teachers in Chicago. For the first time, we have measures from observations for most of CPS’s 19,000 teachers and individual value-added measures for those teachers teaching tested subjects in grades 3-8. Thus we can begin to examine to what extent Chicago’s patterns of teacher quality align with or differ from what research has found elsewhere.




This study adds to what is known about measures of teaching effectiveness in a number of ways. Most of the prior research has only examined relationships between school characteristics and teacher effectiveness, as measured by teacher qualifications (e.g., education level or experience) or value-added scores; this study explores patterns in observation scores, as well as value-added scores and teacher qualification metrics. Observation scores are available for a much broader range of teachers than value-added scores, which are only calculated for teachers of particular subjects in particular grades. They also directly capture differences in students’ experiences in the classroom, rather than indirectly capturing the effects of the classroom on students’ test scores. Yet, because observation scores are not adjusted for differences in students’ characteristics, they may also be influenced by the characteristics of schools and classrooms more so than the value-added metrics. Comparing the patterns of school and teacher characteristics on two measures, observation and value-added scores, allows us to capture different dimensions of these patterns and may also begin to surface questions about whether either measure may be sensitive to school context or other external factors.


Data Used for This Study 


REACH evaluation scores are a combination of professional practice, measured by observation ratings, and student growth. Ratings from four observations, conducted by principals or assistant principals who have been trained and certified on the rubric, make up the professional practice score. Observation ratings are based on the CPS Framework for Teaching, a modified version of the Danielson Framework for Teaching. Ratings include 19 components, based on four domains of practice: Planning and Preparation, Classroom Environment, Instruction, and Professional Responsibilities.¹⁷

Student growth includes two different metrics, value-added scores and performance tasks.¹⁸ Student growth totals are combined in different ways, based on a teacher’s grade and subject. Most teachers get a value-added score: either an individual score if they teach in tested subjects and grades, or a school-wide literacy score if they teach in non-tested subjects or grades. Teachers also receive a score based on student growth on performance tasks, assessments created by teams of teachers and district-level content specialists. There are grade- and subject-specific performance tasks covering virtually all grades and subjects.

Evaluation scores carry consequences for both tenured and non-tenured teachers. A teacher’s summative REACH rating is tied directly to the district’s dismissal, remediation, and tenure attainment policies.¹⁹

In this report, we use 2013-14 data from a number of sources, including CPS administrative data, CPS REACH teacher evaluation data, and responses of students and teachers to the My Voice, My School Survey.


CPS Schools 2013-14 


Elementary Schools 


High Schools 101 


Total Number of Schools 


Note: Only schools with REACH data in 2013-14 were included in our analyses 
and reflected in the numbers above. Charter schools do not utilize REACH and 
are not included in analyses. 


2013-14 REACH Evaluation Data 


REACH teacher evaluation data utilized in this report are from 2013-14 evaluation ratings of teachers.²⁰ Analyses of observation scores include non-tenured teachers and tenured teachers with observation ratings from at least two observations during the 2013-14 school year. The analyses only include teachers who were rated using the CPS Framework for Teaching. Librarians, counselors, clinicians, and other education support specialists rated on a different framework were not included in the analyses. Charter schools do not utilize REACH and are not included in our analyses.


Number of Teachers with REACH Evaluation Scores 2013-14 


Teacher School Level 
and Tenure Status 


Elementary Non-Tenured Teachers 
Elementary Tenured Teachers 
High School Non-Tenured Teachers 


High School Tenured Teachers 


Number of Teachers with at 
Least Two Observations 


Number of Teachers with 
Individual Value-Added Scores 


OrS23) 
1,284 


3,763 


Note: Only teachers with ratings from at least two observations were included in our analyses of observations. In 2013-14, about 94 percent of teachers (out of 19,098 total teachers in our dataset) had ratings from at least two observations.


17 See Appendix B for the CPS Framework for Teaching. 
18 See Appendix A and Appendix B for more details on REACH. 
19 Chicago Public Schools (2014). 
20 For more details about REACH measures and their correlations, see Jiang et al. (2014).




CPS utilizes a variety of scales to report observation and value-added measures. In this report, for both value-added and observation scores, we utilize the 100-to-400-point scale used by CPS for overall professional practice and student growth scores for ease of comparison.

Throughout this report, observation scores refer to the overall professional practice scores for teachers, calculated using a weighted average of component ratings. Observation ratings refer to the four-level rating scale of the CPS Framework for Teaching (Unsatisfactory, Developing, Proficient, Distinguished).²¹ In 2013-14, observations comprised 75 percent of a teacher’s overall REACH score.²²

Value-added scores refer to the individual value-added scores received by teachers in grades 3-8, teaching reading or math. In this report, we utilize multi-year averages (from 2012-13 and 2013-14) if teachers had value-added scores from both years available. Value-added scores are based on student results from the NWEA-MAP test, an adaptive, computer-based test.

Analyses of value-added scores in this report include only elementary teachers who received individual value-added scores. In 2013-14, high school teachers teaching ninth- through eleventh-grade tested subjects received an individual value-added score. However, this value-added score only counted for 5 percent of their overall REACH score and, in 2014-15, no value-added scores were included in high school teachers’ REACH ratings.

Student growth also includes a score based on performance tasks. These are written or hands-on assessments developed by teams of CPS teachers and central office staff and are designed to measure the mastery or progress toward mastery of a particular skill or standard. In both 2012-13 and 2013-14, we found little variation on teachers’ performance task scores²³ and thus did not include analyses on performance tasks in this report.


Average Score

Observations 48.28
Reading Value Added
Math Value Added

40.78
44.49

Observation and Value-Added Scores 2013-14

Standard
Deviation


2013-14 CPS Administrative Data 


To examine equity of ratings across different types of schools, we aggregated CPS administrative student data to the school level, including students’ race, previous achievement, and free/reduced-price lunch status. Measures of the economic status of students in schools were also derived from census data on students’ residential neighborhoods, measured at the census block group level. A measure of poverty captured the percent of adult males unemployed and the percent of families with incomes below the poverty line.


2014 My Voice, My School Survey


Chicago students in grades 6-12 and all teachers and principals have been responding to surveys developed by UChicago Consortium since 1991. Most of the survey content is organized around the Five Essentials for School Improvement.²⁴ This report uses teachers’ perceptions of their school’s leadership, professional capacity, and parent support and students’ perceptions of instruction and learning climate from the 2014 My Voice, My School survey as indicators of their school’s organizational strength. For teachers, the response rate was 81 percent; for students, the overall response rate was 79 percent.


Number of 
Teachers 


Minimum Maximum 


100 
100 
100 


4,106 


Note: For both value-added and observation scores, we utilize the 100 to 400 point scale used by CPS for overall professional practice and student growth 
scores for ease of comparison. In this report, we utilized multi-year averages (from 2012-13 and 2013-14) if teachers had value-added scores from both years. 


21 See Appendix B for the CPS Framework for Teaching. 
22 See Appendix A for details on percentages of each measure. 
23 Jiang et al. (2014).
24 Bryk, Sebring, Allensworth, Luppescu, & Easton (2010). 


CHAPTER 1 


The Relationship between School 
Characteristics and REACH Scores 


In this chapter, we explore how teachers’ observation and value-added scores are distributed across schools. Prior research in Chicago and elsewhere has consistently shown that high-poverty schools tend to have more novice teachers and fewer teachers with stronger qualifications in terms of teacher test scores and certification.²⁵ Here, we investigate the relationship between school characteristics and teacher value-added and observation scores. In addition, we describe how teachers’ evaluation scores differ by their school’s organizational strength.

We present results for elementary schools because both observation and value-added scores are available for them. Findings on high school observations were similar to those for elementary schools and can be found in Figure D.3.


Teachers with the lowest value-added and observation scores are overrepresented in schools that serve the most disadvantaged students. On both value-added and observation scores, teachers with the lowest scores are overrepresented in schools serving the highest concentration of students in poverty. Figures 1.A and 1.B show how teachers with the highest and lowest scores on value added and observations are distributed among schools of varying poverty levels. If teacher ratings were evenly distributed across schools at all levels of student poverty, we would expect 20 percent of teachers with the lowest ratings to be at each level of student poverty; we would also expect that 20 percent of teachers with the highest ratings would be in schools at each level of student poverty.
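
The 20 percent benchmark can be illustrated with a short sketch. It is not the report's actual analysis code: the function names, the rank-based grouping of teachers into five equal score groups, and the toy data are all assumptions made for illustration, and the benchmark only holds if teachers are spread evenly across five school poverty levels.

```python
from collections import Counter


def quintile_labels(values):
    """Label each value with a rank-based quintile, 1 (lowest) to 5 (highest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels


def share_by_poverty_level(teacher_scores, school_poverty_levels, score_quintile):
    """Percent of teachers in one score quintile found at each poverty level (1-5).

    school_poverty_levels holds, for each teacher, the poverty level of that
    teacher's school (1 = lowest poverty, 5 = highest poverty); this coding
    is hypothetical.
    """
    score_q = quintile_labels(teacher_scores)
    selected = [p for q, p in zip(score_q, school_poverty_levels) if q == score_quintile]
    counts = Counter(selected)
    return {level: 100 * counts.get(level, 0) / len(selected) for level in range(1, 6)}


# Toy data: 25 teachers whose scores are unrelated to school poverty.
scores = list(range(25))
poverty = [i % 5 + 1 for i in range(25)]

# Lowest-scoring fifth of teachers: each poverty level holds 20.0 percent.
print(share_by_poverty_level(scores, poverty, score_quintile=1))
```

Shares above 20 percent at the highest poverty level, such as those reported for the lowest-scoring teachers, would indicate overrepresentation relative to this even-distribution baseline.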

Teacher scores are not evenly distributed across 
schools. The lowest-scoring teachers on both observa- 
tion and value-added measures are more likely to be 
in schools with the greatest concentration of poverty. 
For example, of the teachers with the lowest scores on 
value added, 26 percent are in schools with the highest 
levels of poverty while only 13 percent are in schools 
with lowest concentration of poverty (see Figure 1.A). 

The differences in observation scores by school 
poverty level are more pronounced. Of the teachers 
with the lowest scores on observation ratings, 30 per- 
cent are in schools with the highest poverty while 
only 9 percent are in schools with lowest poverty 


(see Figure 1.B).26


Teachers with the highest scores on observations are 
vastly underrepresented in highest-poverty schools; 
teachers with highest scores on value added are 
more evenly distributed across schools. Next we turn 
our attention from the lowest-scoring teachers to the 


highest-scoring teachers—those with the top 20 percent 


Why Might We See Differences between Observations and 


Value-Added Scores? 


Observation ratings are meant to capture a teacher’s 
level of instructional practice in a classroom. An evalu- 
ator observes a classroom and captures evidence of 
a teacher’s practice. This evidence is then utilized to 
assign ratings. Observation ratings do not control for 
any student or school characteristics. 

Value-added estimates are intended to capture 


students’ growth on test scores. Value-added esti- 
mates are constructed so that they explicitly control 
for measurable student characteristics such as gender, 
race/ethnicity, economic disadvantage, English learner, 
disability, and mobility, as well as prior test scores. This 
is meant to compare a teacher’s students to similar 
students districtwide. 


25 Lankford et al. (2002); Clotfelter et al. (2005); Goldhaber (2015); 
DeAngelis et al. (2005). 


26 All analyses on observation scores conducted in this report were 


also replicated using the subsample of teachers with individual 
value-added scores. Results were similar to the full sample of 
teachers. 
FIGURE 1


Teachers with Lowest Value-Added and Observation Scores Are Overrepresented in Schools Serving the
Most Disadvantaged Students 


FIGURE 1.A 
Distribution of Teachers with Highest and Lowest Value-Added Scores by School Poverty Level 
FIGURE 1.B


Distribution of Elementary School Teachers with the Highest and Lowest Observation Scores by School Poverty Level 
Note: We ranked elementary and high schools by the percentage of students receiving free/reduced-price lunch. After ranking schools, we divided them into five equal-sized
groups (quintiles), with the first quintile representing the lowest-poverty schools and the fifth quintile the highest-poverty schools. Both figures represent elementary school 
teachers only. Value-added scores include teachers in grades 3-8 reading and math only. There were 1,089 teachers with lowest value-added scores and 915 teachers with 
highest value-added scores. There were 2,609 teachers with lowest observation scores and 2,594 teachers with highest observation scores. 


Measures of School Characteristics 


School-level measures of student economic disadvantage,
minority students, and prior achievement are highly related.
In this report, we present relationships between REACH scores
and school-level poverty because it is one of the biggest
determinants of student outcomes and many policy interventions
are motivated to reduce gaps in performance between
low-income and other students.

Relationships between teacher evaluation scores and
school-level percentages of minority students and
achievement were similar to our findings with
school-level poverty. Examples of these are provided
in Appendix C.
Similar Findings in States and Districts Across the Nation


Chicago is not the only district that has found relationships between teacher evaluation scores and school or stu- 


dent characteristics. Districts and states across the nation are finding similar trends. These include the following: 


• Washington, DC: Teachers in low-poverty schools
had higher IMPACT results than teachers in
medium- and high-poverty schools.A

• Minneapolis: Schools with student populations
of higher poverty had larger concentrations of
teachers with scores below average.B

• Washington State: Measures of teacher quality,
such as experience, licensure exam scores,
and value-added estimates of effectiveness, are
inequitably distributed across every indicator of
student disadvantage: free/reduced-price lunch
status, underrepresented minority, and low prior
academic performance.C

• Florida and North Carolina: The average effectiveness
of teachers as measured by value added in high-poverty
schools is, in general, less than that of teachers at
other schools, but only slightly. High-poverty schools
also have much larger within-school variation.D


A Education Consortium for Research and Evaluation (2013).
B Matos (2015).
C Goldhaber et al. (2015).
D Sass, Hannaway, Xu, Figlio, & Feng (2010).


of observation scores. Only 6 percent of teachers with the 
highest observation scores teach in the highest-poverty 
schools, while 34 percent are in schools with lowest 
poverty. Thus, in the highest-poverty schools, not only 
are students more likely to be taught by teachers with 
the bottom scores on observations, they are also the least 
likely to be taught by teachers who have the top scores. 
However, the prevalence of teachers with highest 
value-added scores is about even across schools with 
differing concentrations of economically disadvantaged 
students. As Figure 1.A shows, of teachers with top scores 
on value added, 20 percent teach in the lowest-poverty 


schools and 17 percent teach in highest-poverty schools.27


Differences in evaluation scores between high- and 
low-poverty schools persist even after controlling for 
differences in teacher experience and credentials. 
Differences in teacher credentials or experience levels
between teachers in high- and low-poverty schools are
one possible explanation for why teachers with lower
evaluation scores tend to be in high-poverty schools.
In CPS in 2013-14, higher-poverty schools had higher 


percentages of new teachers and lower percentages of 


27 Value-added scores displayed here depict individual value- 
added scores from 2012-13 and 2013-14 because multi-year 
averages were utilized where available. Individual value-added 
scores include combined scores for teachers teaching math 
and/or reading in grades 3-8. For distributions of teachers 


teachers with National Board Certification.28 However,
our findings in Table 1 show that teacher experience
and credentials do not explain the distribution of
teacher observation and value-added scores among
schools, as background characteristics account for only
a small amount of the difference in either value-added or
observation scores.


On value-added, top-scoring teachers in highest- 
poverty schools have higher scores than their 
counterparts in lower-poverty schools. On average, 
teacher value-added tends to be greater in low-poverty 
schools than in high-poverty schools. This average 
hides some surprising differences. Looking closer at the 
distribution of scores within higher- and lower-poverty 
schools, we find that the teachers with the highest value 
added in the high-poverty schools outscore teachers
with the highest value added in lower-poverty schools 
(see Figure 2). 

There are distinct differences when comparing top, 
bottom, and middle scoring teachers in higher-poverty 
schools to their counterparts in lower-poverty schools. 


The teachers with the smallest value added among 


with highest and lowest scores, broken down by reading and 
math value added, see Appendix D. 

28 See Appendix E for percentages of first-year teachers and 
NBCTs by school poverty level. 
TABLE 1
Average Evaluation Scores by School Poverty Level

School          Observation                 Reading Value-Added         Math Value-Added
Poverty Level   Average       Controlling   Average       Controlling   Average             Controlling
                Score (SD)    for Teacher   Score (SD)    for Teacher   Score (SD)          for Teacher
                              Background                  Background                        Background
1-Lowest        332 (42)      331           256 (34)      256           251 (41)            251
2               312 (49)      308           254 (41)      [illegible]   251 ([illegible])   [illegible]
3               312 (48)      305           256 (45)      258           [illegible]         256
4               304 (48)      298           247 (48)      250           248 (49)            [illegible]
5-Highest       289 (44)      288           246 (51)      247           248 (54)            249


Note: We ranked elementary and high schools by the percentage of students receiving free/reduced-price lunch. After ranking schools, we divided them 
into five equal-sized groups (quintiles), with the first quintile representing the lowest-poverty schools and the fifth quintile the highest-poverty schools. 


The numbers in the “Controlling for teacher background” column represent average scores of teachers controlling for teacher characteristics including experi- 
ence, advanced degrees, National Board Certification, and gender. This allows us to control for differences in teacher composition at schools. For example, some 
schools may have more first-year teachers than other schools. 


FIGURE 2 


Top-Scoring Teachers in Highest-Poverty Schools Have Higher Value-Added Scores than Their Counterparts 
in Lower-Poverty Schools 
How to Read Figure 2: In each graph, the purple bar on the right of each pair of bars represents the average scores of teachers in highest-poverty schools along each
quintile of the distribution of teacher scores, while the orange bar on the left of each pair of bars represents the average scores of teachers in the lowest-poverty 
schools. For example, on each graph, the orange bar at the very left (quintile 1) is the average score of teachers in lowest-poverty schools with the bottom 20 percent 
of scores and the purple bar to the right is the average score of teachers in highest-poverty schools with the bottom 20 percent of scores. On each graph, the purple 
bar at the very right (quintile 5) represents the average score of teachers in the highest-poverty schools with the top scores among teachers in highest-poverty schools 
and the orange bar to the left of it represents the average scores of teachers in the lowest-poverty schools. 


On observation scores, the difference between average scores of teachers is about 40 points along all quintiles. The gap between teachers’ scores in the highest- and 
lowest-poverty schools is about the same for lowest-, highest-, and middle-scoring teachers. On value added reading and math, we see a gap in scores between 
teachers in higher- and lower-poverty schools only among the lower-scoring teachers. This gap is smaller than what we see on observations and not only does it narrow
as one moves up the distribution of teacher scores, it disappears. In fact, on value added, the highest-scoring teachers in the highest-poverty schools have on average 
higher scores than their counterparts in lower-poverty schools. 


Averages along the distribution were calculated separately for highest- and lowest-poverty schools. 
teachers in lower-poverty schools (bottom 20 percent)
score better than their counterparts in the higher- 
poverty schools. This narrows, closes, and reverses as 
one moves up the distribution of teacher value-added 
scores. On both reading and math value added, the 
top-scoring teachers in highest-poverty schools have 
higher scores than their counterparts in lower-poverty 
schools. Thus on value added reading and math, the 
average differences in scores between higher- and 
lower-poverty schools are driven by the lower scores 
of teachers at the lower end of the distribution. 

On observation scores, the difference between aver- 
age scores of teachers at low-poverty versus high-pov- 
erty schools is about 40 points (of the 100 to 400 point 
scale) along all quintiles of teachers. In other words, 
the teachers in the top 20 percent of observation scores 
in higher-poverty schools still score substantially less 
than teachers in the top 20 percent among teachers in 


low-poverty schools on the observation rubric. 


In some schools, almost all observation ratings 
assigned are Proficient or Distinguished; in others, 
half are Basic or Unsatisfactory. There is a moderate 
negative correlation (-0.33) between observation scores 
and the concentration of economically disadvantaged 
students in schools.29 On average, as the percentage
of low-income students in a school increases, observation
scores decrease. Observation scores are composed
of ratings of teaching practice assigned by evaluators
and based on the standards described in the CPS
Framework. There are four such rating categories:
Distinguished, Proficient, Basic, and Unsatisfactory.
Figure 3 shows the percentage of each observation rating
in each school in 2013-14. In some schools, most
ratings assigned were Distinguished (on the left end of 
the figure) and in some schools most ratings were Basic 
or lower (on the right end of the graph). 

The schools with lower percentages of Proficient 
and Distinguished ratings on Figure 3 tend to be 
higher-poverty schools. In fact, about 63 percent of 


schools with the lowest percentages of Proficient and 


29 See Table C.4 in Appendix C for correlations. 
30 See Appendix B for details on these domains. 


Should All Schools Have the 


Same Percentages of Ratings? 


Teachers are not randomly assigned to schools. 
Therefore, we would not expect all schools to 
have the exact same percentages of assigned 
Distinguished or Unsatisfactory ratings, nor the 
exact same distribution on value added. 

Some schools might have a high percentage 
of Distinguished teachers through recruitment or 
professional development efforts, or some schools 
might have a high percentage of teachers with 
Basic ratings because they have many teachers 
who are struggling or high rates of teacher attrition. 


Distinguished ratings were higher-poverty schools 
(schools in the top two quintiles of school poverty). 

The relationship between school poverty and
observation ratings is similar when the ratings are
broken down into each of the four domains of the CPS
Framework for Teaching.30 This moderate relationship
of observation ratings with school poverty is not driven
by one particular domain having a stronger relationship
with school poverty.31 In other words, teachers
in low-poverty schools tend to have higher ratings on 
all aspects of teacher practice that are measured with 
the observation protocol—Planning and Preparation, 
Classroom Environment, Instruction, and Professional 
Responsibilities—compared to teachers in high-poverty 
schools. It is not just one domain of the rubric driving 


these relationships. 


On both observations and value-added metrics, 
most of the variation in scores was among teachers 
within the same school, rather than across schools. 
The associations between observation and value-added 
scores and school-level poverty reflect overall averages 
of school-level scores. However, as shown in Figure 3, 
within most schools, there are some Distinguished rat- 
ings, some Proficient ratings, some Basic ratings, and 


some Unsatisfactory ratings. 


31 Domain averages are strongly correlated with each other. 
See Table C.4 in Appendix C for correlations for each domain. 
FIGURE 3


Schools Varied in the Percentages of Unsatisfactory, Developing, Proficient, and Distinguished Ratings Assigned 


in 2013-14 


School-level Distributions of REACH Observation Ratings
Note: Each bar represents a school and the percentage of Unsatisfactory, Basic, Proficient, and Distinguished ratings assigned to teachers in the school in the 2013-14
school year. Schools with fewer than 10 teachers were excluded. Includes only teachers rated on the teacher framework. Percentages are from 519 schools, 19,087 


teachers and 6,069,069 observation ratings on 19 components. 


In fact, as is the case in the observation ratings, there 
are greater differences in teachers’ observation scores 
among teachers in the same school than there are among 
teachers in different schools. About two-thirds of the total 
difference across teachers’ observation scores is due to 
differences among teachers in the same school; this is the
case for almost three-quarters of the difference among 
teachers’ value-added scores. In any given school—high- 
or low-poverty—there are teachers with very high, low, 
or average scores on observations and teachers with very 


high, low, or average scores on value added. 
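
One simple way to see what "most of the variation is within schools" means is a sum-of-squares decomposition. The sketch below is an illustration under stated assumptions, not the method used in this report: the function name `within_school_share` and the six toy scores are hypothetical, and the report's own estimates may come from a different model.

```python
from collections import defaultdict
from statistics import mean

def within_school_share(records):
    """records: list of (school_id, score) pairs.
    Returns the fraction of the total sum of squares that lies within schools."""
    scores = [score for _, score in records]
    grand_mean = mean(scores)
    by_school = defaultdict(list)
    for school, score in records:
        by_school[school].append(score)
    # Total variation: each teacher's squared distance from the overall mean.
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    # Within-school variation: squared distance from the teacher's own school mean.
    within_ss = sum(
        (s - mean(group)) ** 2 for group in by_school.values() for s in group
    )
    return within_ss / total_ss

# Hypothetical toy data: two schools whose averages differ by 20 points,
# but whose teachers each span a 120-point range.
toy = [("A", 250), ("A", 310), ("A", 370), ("B", 230), ("B", 290), ("B", 350)]
share = within_school_share(toy)
print(f"within-school share of variation: {share:.2f}")
```

In the toy data the school averages differ, yet nearly all of the variation sits inside each school, which is the same qualitative pattern the text describes.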


In schools with stronger culture and climate, teachers 
have higher scores on both value added and observa- 
tions. Prior research has suggested that it might be easi- 
er to be an effective teacher in a school that has a strong 
climate and organization—where school leadership is 


strong and trusted, teachers feel committed to their 


32 Sebastian & Allensworth (2012). 

33 Controlling for school-level characteristics such as poverty 
and prior achievement. See Appendix F for complete 
regression tables and details on analysis. 


job, and students show strong academic behaviors.32
We examined relationships between the climate and
culture of schools and the observation and value-added
scores of teachers in those schools, using student
and teacher perceptions of climate and culture from
the district-wide My Voice, My School surveys from the
spring of 2014. In schools with stronger professional
climate (as perceived by teachers) and stronger learning
climate and instruction (as perceived by students),
teachers tend to have higher value-added and
observation scores.33 Figure 4 depicts the relationships
between a few key measures of school climate and
culture and teachers’ observation, reading value-added,
and math value-added scores.34 For example, the
difference in teachers’ observation scores between schools
with average school commitment and schools with very
strong school commitment is 8.1 points.35 On math


value added the difference is 3.1 points, and on reading 


34 See Appendix F for more information about our surveys and 
our measures of school climate and culture. 

35 Coefficients represent a one standard deviation difference in 
a school’s climate measure. 
FIGURE 4
Relationship between Measures of School Climate and REACH Scores 
Note: Results are coefficients from regressions run separately to avoid collinearity issues and depict a one standard deviation change on measures of school climate
and culture. A one standard deviation difference on a measure is similar to comparing a “Below Average” to an “Average” school on that measure or comparing an 
“Average” school to an “Above Average” school on that measure. We analyzed many measures of school climate and present four here as examples. Results from all 
measures can be found in Table F.1 in Appendix F. 
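
To show what a coefficient "per one standard deviation" means, the sketch below standardizes a climate measure before fitting a one-predictor least-squares slope. It is a simplified illustration only: the function name `slope_per_sd`, the toy values, and the use of the population standard deviation are assumptions, and the report's regressions also include controls that are omitted here.

```python
from statistics import mean, pstdev

def slope_per_sd(climate, scores):
    """Least-squares slope of scores on the z-scored climate measure.
    The result is the expected score difference for a one-SD difference in climate."""
    mu, sd = mean(climate), pstdev(climate)
    z = [(c - mu) / sd for c in climate]  # standardized climate measure (mean 0, SD 1)
    y_bar = mean(scores)
    numerator = sum(zi * (yi - y_bar) for zi, yi in zip(z, scores))
    denominator = sum(zi ** 2 for zi in z)
    return numerator / denominator

# Hypothetical toy data: scores rise 5 points per raw unit of the climate measure.
climate = [1, 2, 3, 4, 5]
scores = [290, 295, 300, 305, 310]
slope = slope_per_sd(climate, scores)
print(f"score difference per one SD of climate: {slope:.2f}")
```

Because the toy climate measure has a standard deviation of about 1.41 raw units, the per-SD slope is about 1.41 times the 5-point raw slope.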


value added the difference is 4.9 points.36

While relationships between evaluation scores and
measures of culture and climate are significant, what
remains unknown is whether a strong school culture
facilitates strong teaching, whether schools with strong
culture are better at recruiting effective teachers, or
whether high-scoring teachers themselves are creating
an environment with strong culture and climate.


36 For both value-added and observation scores, we utilize the 
100 to 400 point scale used by CPS for overall professional 
practice and student growth scores. 
CHAPTER 2


Relationship between Teacher 


Characteristics and REACH Scores 


In this chapter, we investigate the degree to which 
teachers’ credentials (years of experience, advanced 
degree attainment, and National Board Certification) 
are associated with differences in their performance as 
measured by both value-added and observation scores. 
In addition, we examine the relationships between 
teacher characteristics, such as a teacher’s gender or 
race/ethnicity and evaluation scores. As discussed in 
the introduction, there are growing concerns about the 
changing composition of the overall teacher workforce 
in Chicago and elsewhere, as it has become whiter and 
less experienced over time. There are also far fewer 
male teachers in the district than female teachers (see 
Table 2 for percentages of CPS teachers by race/ethnic- 
ity and gender). Previous research has shown that there 
are benefits for students when they have teachers like 
themselves, in terms of race and gender.37 Evaluation
scores are tied to high-stakes decisions such as dis- 


missal, tenure attainment, and remediation. Thus, 


TABLE 2
2013-14 CPS Teachers Race/Ethnicity & Gender

                                    Number        Percentage
                                    of Teachers   of Teachers
White                               9,356         49%
African American                    4,163         22%
Latino                              3,606         19%
Other*                              1,403         7%
Race/Ethnicity Unknown              510           3%

Female                              14,286        77%
Male                                4,242         23%

National Board Certified Teachers   1,462         8%
Teachers with Advanced Degrees      6,195         33%


*Other includes Asian, Hawaiian/Pacific Islander, Native American, and 
multi-racial. 


37 Dee (2005); Ferguson (2003); Egalite et al. (2015). 


there may be implications for the future diversity of the 
teacher workforce if such scores are related to teacher 
characteristics. 

In Chapter 1, we found there were differences on 
evaluation scores by school characteristics. In Chapter 
2, we present results adjusted for the schools teach- 
ers work in, as we want to understand differences by 
teacher characteristics and not school characteristics. 
However, because differences in scores that have not 
been adjusted for school characteristics may have 
meaningful consequences for teachers, we discuss 
unadjusted scores when they are substantially 


different from scores adjusted for school effects. 


Years of Experience, Credentials, 
and Advanced Degrees 


Teachers with more experience have slightly higher 
scores on both observations and value added in 
comparison to teachers in their first year of teaching. 
Comparing teachers within the same schools, there are 
only slight differences in value-added scores between 
teachers with few years of teaching experience and 
teachers with more years of experience. An exception is 
first-year teachers who have lower scores, on average, 
than teachers in any other experience category (see 
Table 3). On reading value-added, first-year teachers’ 
scores were about seven points lower than teachers 
with 2-5 years of experience; on math value added the 
difference was about three points. 

There are larger differences in observation scores 
between teachers with one year of experience and 
teachers with two years than there are in value-added 
scores. First-year teachers, on average, score 17 observation
score points lower than teachers with 2-5 years
of experience. Thus, first-year teachers receive signifi- 


cantly lower ratings on both observations and value- 
TABLE 3
Years of Experience

Average Scores Controlling for School Effects and Other Teacher Characteristics

Experience        Observation   Reading Value-   Math Value-
in the District   Score         Added Score      Added Score
1 Year            [illegible]   245.4            245.7
2-5 Years         [illegible]   251.9            248.5
6-10 Years        [illegible]   252.0            252.4
10-15 Years       [illegible]   255.1            255.9
>15 Years         [illegible]   252.5            250.8


Note: The numbers on this table are predictions from a school fixed-effects 
regression model, representing comparisons of teachers with different 
experience levels who teach at the same school, controlling for National Board 
Certification, gender, and race/ethnicity. Bold indicates statistical significance 
(p<0.05) in comparison to first-year teachers. More details and unadjusted
averages are available in Appendix G. 
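
The idea behind a school fixed-effects comparison can be sketched with a single predictor. The example below is a minimal illustration, not the report's model: the function name `within_school_gap` and the toy records are hypothetical, and the other teacher controls named in the note are left out. Scores and the first-year indicator are each centered on their school mean, so only comparisons among teachers in the same school contribute to the estimate.

```python
from collections import defaultdict

def within_school_gap(records):
    """records: list of (school_id, is_first_year, score).
    Returns the within-school (fixed-effects) estimate of the first-year score gap."""
    groups = defaultdict(list)
    for school, first_year, score in records:
        groups[school].append((float(first_year), float(score)))
    numerator = denominator = 0.0
    for rows in groups.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            # Deviations from the school mean remove any school-wide difference.
            numerator += (x - x_bar) * (y - y_bar)
            denominator += (x - x_bar) ** 2
    return numerator / denominator

# Hypothetical toy data: school A scores 40 points higher than school B overall,
# but in both schools the first-year teacher is 17 points below colleagues.
toy = [
    ("A", 1, 300), ("A", 0, 317), ("A", 0, 317),
    ("B", 1, 260), ("B", 0, 277),
]
gap = within_school_gap(toy)
print(f"within-school first-year gap: {gap:.1f} points")
```

The 40-point difference between the two toy schools does not affect the estimate, which is the sense in which the table compares teachers "who teach at the same school."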


added scores than teachers with more than one year of 
experience.38

There are smaller differences among groups of 
teachers with more than one year of experience; 
teachers with more than 15 years of experience have 
slightly lower scores on both observation and value- 
added scores in comparison to teachers with 2-15 years 
of experience in the district. Using these data, we cannot
say if these differences reflect teachers changing as
they gain more experience, teachers differing by
cohort, or selective attrition among teachers.39


38 Because first-year teachers are more likely to be teaching in 
high-poverty schools (see Appendix E), the adjustments for 
school factors may mask the overall difference in ratings for 
first-year teachers compared to others across the district— 
not just compared to teachers in the same school. In fact, 
we do find unadjusted differences in observation scores 
between first-year teachers and those with more experience 
to be slightly greater than the differences adjusted for school 
effects. On average, the unadjusted first-year teachers’ 
observation scores are 24 points lower than those of teachers 
with 2-5 years of experience, while their reading value-added 
scores and math value-added scores are six and two points 
lower, respectively (see Table G.1 in Appendix G). 

39 These data are cross-sectional. They denote teacher evalua- 
tion scores and years of experience in the district for 2013-14 
and are not a comparison over time. Thus, we cannot and do 
not draw conclusions about teacher improvement over time, 


National Board Certified teachers have higher observa- 
tion scores but no significant differences on value add- 
ed compared to those without this credential. National 
Board Certification is a national teaching credential 
established by the National Board for Professional 
Teaching Standards (NBPTS) to signify the accomplishment
of a high level of professional teaching.40
Findings from studies that explore the relationship between
National Board Certification and value-added scores have been
mixed; some studies have found evidence of small but
significant effects of National Board Certified Teachers
(NBCTs) on student achievement gains,41 while other
studies have found no evidence of differences between
NBCTs and non-NBCTs on value-added measures.42

One study related observation scores to National Board 
Certification; scores were significantly higher for math 
teachers with board certification than those for teach- 
ers without that credential, but this difference was not 
significant in other subjects.43

REACH observation scores for NBCTs were 17 points
higher than those of non-NBCTs, controlling for other teacher
characteristics, such as tenure and advanced degree, 
and adjusting for school effects. However, there were 
no significant differences between NBCTs and non- 
NBCTs on value-added scores in either reading or math 


(see Table 4).44 


as currently we have not accounted for attrition from the 
district or differences in teacher cohorts over time. 

40 Teachers must submit extensive portfolios and pass a 
number of assessments. There is a 48 percent passage rate 
for those attempting certification. http://www.nbpts.org/ 

41 Goldhaber & Anthony (2007); Crofford, Pederson, & Garn 
(2014); Cowan & Goldhaber (2015); Cavalluzzo (2004). 

42 Clotfelter et al. (2006); Harris & Sass (2011); Cantrell, 
Fullerton, Kane, & Staiger (2008). 

43 Cavalluzzo, Barrow, & Henderson (2014). 

44 On average, unadjusted for school effects or teacher
characteristics such as years of experience, the observation
scores of NBCTs are 29 points higher than those of non-
NBCTs, while there are no significant differences in either
reading or math value-added scores
between NBCTs and non-NBCTs. See Table G.2 in Appendix G
for raw averages.
Teachers with advanced degrees have slightly higher
observation scores than those without; there are no
significant differences in their value-added scores.45
Research on the relationship between advanced degrees
and value-added scores has found that having a higher
degree is not associated with higher student achievement
gains.46 Teachers in CPS with advanced degrees
scored five points higher on observation scores than 
those without when comparing teachers within the 
same school and controlling for other teacher charac- 
teristics. However, there were no significant differences 
between teachers with and without advanced degrees 


on either math or reading value added (see Table 4). 


Teacher Gender and Race/Ethnicity 
Previous studies relating teacher gender or race to their 
evaluation score are sparse as many districts are just 
beginning to generate new evaluation metrics. It is also 
difficult to parse whether differences by race/ethnic- 
ity and gender are due to differences in the classes or 
characteristics of students assigned to teachers of different
races or genders.47 In our analyses, we are able


TABLE 4
National Board Certification & Advanced Degrees

Average Scores Controlling for School Effects and Other Teacher Characteristics

                  Observation   Reading Value-   Math Value-
                  Scores        Added Score      Added Score
Non-NBCTs         309.2         [illegible]      [illegible]
NBCTs             [illegible]   [illegible]      [illegible]
BA Only           307.4         [illegible]      [illegible]
Advanced Degree   312.0         252.4            251.7

Note: The numbers on this table are predictions from a school fixed-effects 
regression model representing comparisons of teachers with different creden- 
tials who teach at the same school, controlling for other teacher characteristics 
such as years of experience, gender, and race/ethnicity. Bold indicates statistical
significance (p<0.05) in comparison to the excluded group (non-NBCTs or
BA only). More details and unadjusted averages are available in Appendix G. 


45 Currently we have not analyzed the specific fields of study 
of advanced degrees. Here, advanced degrees refer to any 
teachers with master’s or doctoral degrees. 


to adjust for the differences in schools in which teachers
work, but not for any differences in the students or
classes they are assigned within schools.


Male teachers have lower observation and value-added 
scores in comparison to female teachers. Male teachers 
had lower observation and value-added scores in comparison
to female teachers with the same level of experience
and credentials. Unadjusted for school effects, male
teachers scored about 11 points lower on observations,
2 points lower on math value added, and 5 points lower
on reading value added (on a 100 to 400 point scale).48
There is little difference between average scores
that have and have not been adjusted for school effects.
Differences between male and female teachers on both
observations and value-added scores remain significant
after adjusting for school effects. On average, male
teachers scored 12 points lower on observations,
about 4 points lower on math value added,
and about 5 points lower on reading value added than
their female counterparts within the same schools (see


Table 5). 


TABLE 5
Average Scores by Teacher Race/Ethnicity & Gender

Average Scores Controlling for School Effects and Other Teacher Characteristics

                   Observation   Reading Value-   Math Value-
                   Scores        Added Score      Added Score
Female             [illegible]   252.6            [illegible]
Male               301.6         247.9            247.7
White              314.5         [illegible]      [illegible]
African American   304.5         251.4            249.1
Latino             308.0         [illegible]      252.4
Other              307.0         252.4            255.0

Note: The numbers on this table are predictions from a school fixed-effects 
regression model representing comparisons of teachers with different creden- 
tials who teach at the same school, controlling for other teacher characteristics 
such as years of experience, gender, and credentials. Bold indicates statistical 
significance (p<0.05) in comparison to the excluded group (non-NBCTs or


BA only). More details and unadjusted averages are available in Appendix G. 


46 Clotfelter et al. (2006). 
47 Paufler & Amrein-Beardsley (2013). 
48 See Table G.3 in Appendix G for unadjusted averages. 
Teachers from racial/ethnic minorities have lower
observation scores than white teachers, but no 
significant differences on value added. As we have dis- 
cussed in Chapter 1, observation scores and the level of 
student poverty at a school are related. The percentage
of minority teachers and school-poverty level are also 
related. As the concentration of low-income students 
within a school increases, so does the percentage of 
teachers from racial/ethnic minorities. In lowest- 
poverty schools, 30 percent of teachers are from racial/ 
ethnic minorities; in highest-poverty schools, 66 per- 
cent of teachers are from racial/ethnic minorities (see 
Figure 5). The differences are especially distinct for 
African American teachers. In highest-poverty schools, 
almost 40 percent of teachers are African American; in 
the lowest-poverty schools, only 8 percent of teachers 
are African American. 

If we do not take into account the differences in 


the types of schools in which teachers teach, African 


FIGURE 5 


Percentage of Teachers by Teacher Race/Ethnicity 
and School Poverty Level 
Note: Other includes Asian, Hawaiian/Pacific Islander, Native American, and
multi-racial. 


49 See Table G.3 in Appendix G for unadjusted averages. 

50 We utilized a school fixed-effects model to compare teachers 
within the same school across all schools. See Appendix G 
for details. 


American teachers’ observation scores were about
30 points lower on the observation scale (which ranges
from 100 to 400 points), Latino teachers’ observation
scores were about 8 points lower, and other minority
teachers’ scores were 10 points lower than those of
non-minority teachers with similar levels of experience
and credentials.49
However, the unadjusted differences do not account
for the relationships between the percentage of minority
teachers in a school and school characteristics. After
adjusting for differences in schools, the differences in
observation scores between teachers of racial/ethnic
minorities and white teachers get smaller, especially
for African American teachers. African American
teachers scored 10 points lower than white teachers
with similar levels of experience and credentials teaching
in the same schools.50 Latino teachers and other
minority teachers scored about 7 points lower than
white teachers with similar levels of experience teaching
in the same schools (see Table 5).


Differences between African American teachers and 
white teachers’ observation scores are largely driven 
by differences in schools; there are no significant 
differences in their value-added scores. The substan- 
tial difference between scores adjusted and unadjusted 
for school effects shows that a large proportion of 

the difference in observation scores between African 
American and white teachers is due to the substantial 
relationship between observation scores and school 
characteristics, such as school-level poverty, and the 
fact that African American teachers are overrepresent- 
ed in the highest-poverty schools and underrepresented 
in the lowest-poverty schools. 
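The contrast between unadjusted and school-adjusted differences can be illustrated with a minimal sketch. The data below are invented for illustration and are not CPS data; the function names are hypothetical, and demeaning by school is a simplified stand-in for the school fixed-effects model described in Appendix G, which also controls for experience and credentials.

```python
import numpy as np

def unadjusted_gap(score, group):
    """Mean score of group 1 minus mean score of group 0, ignoring schools."""
    score = np.asarray(score, dtype=float)
    group = np.asarray(group)
    return float(score[group == 1].mean() - score[group == 0].mean())

def within_school_gap(score, group, school):
    """Within-school (fixed-effects style) estimate of the group gap.

    Score and group indicator are both demeaned by school, then the
    least-squares slope of demeaned score on demeaned group is returned.
    """
    score = np.asarray(score, dtype=float)
    group = np.asarray(group, dtype=float)
    school = np.asarray(school)
    y = score.copy()
    x = group.copy()
    for s in np.unique(school):
        mask = school == s
        y[mask] -= score[mask].mean()
        x[mask] -= group[mask].mean()
    return float((x * y).sum() / (x * x).sum())

# Hypothetical data: school A scores 40 points higher than school B overall,
# and group 1 teachers are concentrated in school B.
SCHOOL = ["A"] * 5 + ["B"] * 5
GROUP = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
SCORE = [320, 320, 320, 320, 310, 280, 270, 270, 270, 270]

print(f"unadjusted gap:    {unadjusted_gap(SCORE, GROUP):.1f}")
print(f"within-school gap: {within_school_gap(SCORE, GROUP, SCHOOL):.1f}")
```

In this constructed example the unadjusted gap is much larger than the within-school gap because most of it reflects which schools the two groups teach in, which mirrors the pattern described above.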

There were no significant differences by teacher
race/ethnicity on either reading or math value-added
scores. This contrasts with what we find on observation
scores. One possible explanation for the remaining
differences between minority and white teachers
within the same schools is racial differences
between the evaluator and teacher. However, in
CPS most African American teachers’ evaluators were
also African American. For example, in 2013-14, about
75 percent of African American teachers’ observations
were conducted by African American evaluators.51
Another possible explanation is that there are differ- 
ences in the students assigned to minority and white 
teachers within the same schools. As discussed previ- 


ously, value-added measures explicitly control for 


51 About 45 percent of white teachers’ observations and 42 
percent of Latino teachers’ observations were conducted 
by an evaluator of the same race/ethnicity. 

52 Kalogrides, Loeb, & Beteille (2012). 


student characteristics such as economic disadvantage
and previous achievement. There is research evidence
that within schools minority teachers are assigned
lower-achieving students than their white colleagues.52
Thus, it is possible the remaining differences on observation
scores between minority and white teachers may be
due to differences in the classrooms assigned to teachers
within schools.
CHAPTER 3


Interpretive Summary


In this study, we find that schools with higher con- 
centrations of disadvantaged students tend to have 
teachers with lower observation scores and teachers 
with lower value-added scores. This relationship is 
much more pronounced for observation scores than it 
is for the individual value-added scores of teachers in 
tested grades and subjects. We also find that schools 
with stronger culture and climate have higher teacher 
evaluation scores, even in schools with similar levels of 
poverty. 

In addition, we find some differences in the relation- 
ships between evaluation scores and individual teacher 
background characteristics. Teachers in their first year 
of teaching typically have lower value-added and obser- 
vation scores than their more experienced colleagues. 
NBCTs have higher observation scores than their colleagues,
as do, to a lesser degree, teachers with master’s
degrees; however, the value-added scores of NBCTs and
of teachers with master’s degrees do not differ from
those of teachers without such credentials. We also find that male
teachers receive lower value-added and observation 
scores than female teachers. Teachers from racial/ethnic 
minorities receive lower observation scores than white 
teachers, although a large proportion of these differenc- 
es can be attributed to the differences in characteristics 
of the schools in which they teach. There is no difference 
in value-added scores between teachers from racial/ 
ethnic minority groups and white teachers. 

We find observations have a much stronger relation- 
ship with school characteristics and some teacher char- 
acteristics than value added. Evaluation systems use 
multiple measures in order to capture different aspects 
of teacher performance. Value-added scores explicitly 
control for student characteristics such as previous 
achievement, student poverty, and mobility to compare 


a teacher’s students to similar students district-wide.


53 See box entitled Similar Findings in States and Districts 
Across the Nation on page 11. 
Observations do not control for student characteristics;
in fact, the observation rubric utilized in Chicago
(and in many districts) relies on evidence of not only
the teacher’s instruction but also of the students’
engagement with that instruction. This evidence may
reflect classroom characteristics (such as the previous
achievement level of students in the class) and
school-wide policies (such as discipline and attendance)
outside of a teacher’s control. This difference may help to
explain why observation scores may have a stronger
relationship with school characteristics and some
teacher characteristics than value added.


This Report Leaves Many 
Unresolved Questions 


This report presents a descriptive analysis of the relationships
between teachers’ evaluation scores, their characteristics,
and the context in which they teach, but it does not
draw conclusions about why these relationships exist. We
do know that these patterns are found not only in Chicago
but also in districts across the nation.53 We conclude by
discussing critical gaps in the knowledge base on teacher
evaluation and the questions raised by our findings.


1. On average, teachers at high-poverty schools receive 
lower observation scores than teachers at low-poverty 
schools. To what extent does this reflect true differences 
in teacher effectiveness or the sensitivity of observation 
scores to classroom or school contexts? 

Since, on average, teachers at high-poverty schools 
receive lower observation scores than teachers at low- 
poverty schools, one key question that this report raises 
is whether they are actually providing less-effective 
instruction or whether observation-based measures of
teacher performance are influenced by the students 


they teach or the schools they serve. 




It is certainly possible that the lower observation
scores in high-poverty schools indicate that students
in those schools receive lower-quality instruction than
students in schools with fewer disadvantages. The
hallmark of the Distinguished rating, the highest level
of practice on the Chicago Framework, is that teachers
have been able to create a community of learners in
which students assume a large part of the responsibility
for the success of a lesson and their own learning.54
However, past research suggests that students in many
higher-poverty schools spend more time sitting and
listening to scripted instruction rather than discussing,
debating, and sharing ideas.55 Furthermore,
accountability pressures may increase focus on test
preparation.56 For over two decades, Martin Haberman
has noted that a “pedagogy of poverty” exists in some
high-poverty urban schools, where students experience
a tightly controlled routine of teacher direction and
student compliance, a pedagogy that is far different
from “the questioning, discovering, arguing, and collaborating”
that is more common among students in lower-
poverty schools.57

In addition, teachers tend to prefer to work in
schools where they are more likely to be effective.58
Schools with more disadvantaged students have more
difficulty recruiting teachers and experience higher
turnover.59 Thus, over time, more effective teachers
may be moving to lower-poverty schools.60

However, observation scores may also reflect the 
characteristics of students and schools. Teaching is 
an interaction among the teacher, the students, and the 
content within the context of schools.61 The Danielson
Framework for Teaching (on which Chicago’s obser- 
vation framework is based) is intended to explicitly 
capture classroom interactions; ratings rely on evidence 
of both the teacher’s instruction and on the students’ 


reactions to and engagement with that instruction. It 


54 Danielson (2011). 

55 Smith, Lee, & Newmann (2001); Diamond & Spillane (2004); 
Anyon (1980); Knapp, Turnbull, & Shields (1990). 

56 Diamond & Spillane (2004); Taylor, Shepard, Kinner, & 
Rosenthal (2001). 

57 Haberman (1991); Haberman (2010). 

58 Simon & Moore-Johnson (2015); Lankford et al. (2002); 
Hanushek, Kain, & Rivkin (2001). 




is possible that the same teacher teaching the same
lesson with the exact same instructional techniques
may have very different results with one classroom of
students versus another; those differences could result
in different observation scores in the two classrooms.
Furthermore, observations take place in a larger
context: it may be harder to earn high
scores in disorganized, chaotic schools or in schools with
few resources or instructional supports for teachers,
as other studies have shown.62 There is emerging research
evidence that characteristics of a teacher’s
students are related to a teacher’s observation ratings;
observation scores tend to be lower in classrooms of
students with lower previous achievement.63 One
study concluded this is the result of bias in the
observation scores against teachers who are assigned
less able and less prepared students and recommended
statistically adjusting observation scores for the background
characteristics of the students in the classroom.64


2. Why do we see differences in observation scores 
by gender, race, and ethnicity? 

A second important question that this report does

not answer is why, on average, minority teachers get 
lower observation scores than their white colleagues, 
even though we see no significant differences in their 
value-added scores. We see that a substantial propor- 
tion of the difference on the observation scores between 
white and minority teachers is due to minority teachers 
disproportionately teaching in high-poverty schools 
since, on average, teachers in high-poverty schools get 
lower observation scores than teachers in low-poverty 
schools. This report does not address the extent to 
which there may be further systematic differences in 
how students are assigned to teachers, which may also 
help explain these differences. For example, minority 


or male teachers may be assigned more students with 


59 Boyd, Grossman, Lankford, Loeb, & Wyckoff (2008); Allensworth,
Ponisciak, & Mazzeo (2009); Goldhaber, Gross, & Player (2010);
Guarino, Santibañez, & Daley (2006); Hanushek, Kain, & Rivkin
(2001); Scafidi, Sjoquist, & Stinebrickner (2007).

60 Kalogrides & Loeb (2013).

61 Cohen, Raudenbush, & Ball (2003).

62 Cohen, McCabe, Michelli, & Pickeral (2009); Hargreaves (1997).

63 Whitehurst et al. (2014); Steinberg & Garrett (forthcoming).

64 Whitehurst et al. (2014). 


discipline issues, or with previous low achievement or 
special education status. If so, it is possible this might 
negatively impact what an observer sees in their class- 
room and therefore what score they are assigned. 

This report also does not address the extent to which
evaluators’ characteristics may be related to observation
scores. For example, do the ratings of evaluators who are
of the same race/ethnicity and/or gender as the teachers
differ from the ratings assigned by evaluators who are
different with respect to gender or race/ethnicity? Does
the age of the evaluator relative to the age of the teacher
have an impact on teachers’ scores? In Chicago, evaluators
can only be the school principal or assistant principal;
the principal’s prior knowledge and relationship
with their teachers may influence their ratings.65 It may
also be possible that principals inflate their ratings at
the high end of the scale, as other studies have found.66
Furthermore, the report does not answer whether the
reliability or validity of an evaluator’s observation
ratings is related to school characteristics.

We plan to address some of these questions in future 


reports. 


3. In general, students in high-poverty schools are 
more likely to be taught by teachers with lower 
observation and value-added scores. What are 
some possible ways to ensure all students have 
equal access to high-quality education? 

The district’s most disadvantaged students are more 
likely to have teachers with lower observation and 
value-added scores. However, the findings in this report 
also show there is a greater difference in teacher scores 
within schools than there is across schools. In other 
words, teachers with high observation and high value- 
added scores are found in almost all schools. Thus, in 
even the highest-poverty schools, there are teachers 
with high scores. 
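The comparison of differences within schools and across schools can be made concrete by splitting total score variance into a between-school part and a within-school part. The sketch below uses invented scores, not CPS data, and the function name is hypothetical; it only illustrates the arithmetic behind such a comparison.

```python
import numpy as np

def variance_decomposition(score, school):
    """Return (between_school_variance, within_school_variance).

    Uses population variances, so the two parts sum to np.var(score).
    """
    score = np.asarray(score, dtype=float)
    school = np.asarray(school)
    grand_mean = score.mean()
    between = 0.0
    within = 0.0
    for s in np.unique(school):
        x = score[school == s]
        between += x.size * (x.mean() - grand_mean) ** 2
        within += ((x - x.mean()) ** 2).sum()
    n = score.size
    return between / n, within / n

# Hypothetical scores for three teachers in each of two schools.
SCHOOLS = ["A", "A", "A", "B", "B", "B"]
SCORES = [250, 300, 350, 230, 280, 330]

between_var, within_var = variance_decomposition(SCORES, SCHOOLS)
print(f"between-school variance: {between_var:.1f}")
print(f"within-school variance:  {within_var:.1f}")
```

In this made-up example the school averages differ by only 20 points while teachers within each school differ by up to 100 points, so most of the variance lies within schools.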

In fact, on value added we see that the best teachers 
in the highest-poverty schools have the best scores in the 
district. While it is possible that teachers with the highest 
ratings are also those who teach the strongest students 


within a school, it is also likely these teachers have skills 


65 Sartain, Stoelinga, & Brown (2011); Jacobs (2010). 
66 Sartain et al. (2011). 
and dispositions that enable them to be effective in the
most challenging of circumstances. School and district 
leaders need to leverage the strengths of these teachers, 
ensure their talents remain in the district and enable 
other teachers to learn from them. 

Our findings also indicate that schools with stron- 
ger culture and climate have higher teacher evaluation 
scores, even in schools with similar levels of poverty. 
Better understanding of how to build strong school or- 
ganizational climate in our highest-need schools may be 
a step in creating environments where teachers can be 
successful. It is possible that districts will not be able to 
improve student-learning gains without improvements 
in schools and more understanding of the structural is- 
sues affecting high-poverty schools and disadvantaged 


students such as housing, health, and crime. 


New Evaluation Systems Have
Potential but There Is a Need 

for Continuous Improvement 

REACH represents a dramatic change from the previ- 
ous checklist system, which was widely regarded as 
superficial and perfunctory and provided little to no 
information about teachers.67 REACH utilizes multiple
measures, and the use of these measures presents
advantages and disadvantages. First, these measures 
may offer some transparency into differences in teacher 
scores by school types and by teacher background char- 
acteristics. Having multiple measures allows districts 
and policymakers to compare and contrast measures 
and further diagnose issues with each measure and 
also capture different aspects of instruction. 

However, there continue to be questions about 
whether teacher evaluation measures—both value added 
and observations—accurately capture teacher quality. 
Differences in observation scores may have implications 
for the teacher workforce. 

In Chicago, as in most districts and states, observa- 
tions make up the bulk of a teacher’s overall evaluation 
score. The lower observation scores of teachers who 
teach in higher-poverty schools may add additional 


disincentives to working in higher-poverty schools or 


67 Weingarten (2010).
with disadvantaged students. In addition, if minority
teachers are more likely to work in contexts where it 

is difficult to get high ratings, the composition of the 
workforce itself could be affected by the personnel 
decisions based on evaluation scores. This is especially 
true for African American teachers, who disproportion- 
ately teach in higher-poverty schools. 

To account for the influence of student characteristics
on teacher observation scores, some researchers
have recommended adjusting teacher observation
scores for the demographic characteristics of their
classrooms for high-stakes accountability decisions
such as dismissal, or gathering further evidence such as
additional observations across different classrooms.68


68 Whitehurst et al. (2014). 
69 Steinberg & Garrett (forthcoming). 
Others suggest that observation scores should utilize
multiple years of teacher data or should include observations
of multiple classes within the same year to adjust
for differences in classroom composition.69

In our previous reports, we found teachers and 
administrators in CPS remain positive about the new 
evaluation system’s potential to drive instructional 
change. Teachers are particularly positive about the 
observation process and the opportunity for feedback, 
reflection, and communication it provides. Designing
and implementing evaluation systems is an ongoing
process, and it is important for districts and policymakers
to continue to improve the design of these systems to
leverage their strengths and mitigate issues.
Appendix A


2013-14 REACH Scores and Ratings 


A teacher’s REACH score is comprised of a professional 
practice score and up to two measures of student growth 
(performance tasks and value added). For more details 
on REACH, visit http://www.cps.edu/reachstudents. 


Professional Practice 


Teachers are evaluated over multiple classroom obser- 
vations using the CPS Framework for Teaching, a modi- 
fied version of the Charlotte Danielson Framework for 
Teaching (see Appendix B for Framework). Formal 
observations last at least 45 minutes and include pre- 
and post-observation conferences. Currently in CPS, 
only principals and assistant principals can be certified 
evaluators. To be assigned a summative REACH evalu- 
ation rating, a teacher must be observed four times. 
Non-tenured teachers70 and tenured teachers with
previous low ratings71 are observed four times annually
and receive a REACH rating each year. Tenured teachers
with previous high ratings are observed four times
over the course of two years and receive a summative
REACH rating every two years since, under Illinois law,
tenured teachers are evaluated every two years.


Student Growth 


To meet Illinois state law requirements about which as- 
sessments must be used for teacher evaluation, CPS has 


identified two different types of student assessments. 


Value-Added Measures 

Teachers who teach grades 3-8 reading and/or math receive
an individual value-added score based on their students’
NWEA MAP, an adaptive, computer-based test.
For high school teachers in core subjects, CPS start- 


ed using the EPAS suite of tests (EXPLORE, PLAN, and 


70 Teachers in CPS typically attain tenure in their fourth year in 
the district. 

71 Tenured teachers with previous low ratings include those 
who received an Unsatisfactory or Satisfactory rating on 


ACT) in the 2013-14 school year. However, high school
value added only counted as 5 percent of a teacher’s 
overall REACH score in 2013-14 for teachers in tested 
subjects and grades. High school value-added scores 
were not utilized in 2014-15. Thus, our analyses in this 


report do not include high school value added. 


Performance Tasks 

Developed by teams of CPS teachers, teams within indi- 
vidual schools, and/or central office staff, performance 
tasks are written or hands-on assessments designed 

to measure the mastery or progress toward mastery of 
a particular skill or standard. Performance tasks are 
typically administered and scored by teachers once 

at the beginning of the year and once at the end of the 


school year. 


REACH Scores and Ratings 


Professional practice scores are combined with student 
growth scores for an overall REACH score, which ranges
from 100 to 400 and translates to a REACH rating

of Unsatisfactory, Developing, Proficient, or Excellent 
(see Table A.1). The percentages assigned to profes- 
sional practice and the measures of student growth for 


different groups of teachers are detailed in Table A.2. 


TABLE A.1 
REACH Ratings 


100 - 209    Unsatisfactory
210 - 284    Developing
285 - 339    Proficient
340 - 400    Excellent
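The score bands in Table A.1 amount to a simple lookup. The sketch below is illustrative only: the function name is hypothetical, the Proficient band is assumed to cover the scores between the Developing and Excellent bands (285 up to but not including 340), and treating each band as a half-open interval for non-integer scores is also an assumption rather than a documented CPS rule.

```python
def reach_rating(score):
    """Map an overall REACH score (100 to 400) to its rating category."""
    if not 100 <= score <= 400:
        raise ValueError("REACH scores range from 100 to 400")
    if score < 210:
        return "Unsatisfactory"
    if score < 285:  # assumed lower bound of the Proficient band
        return "Developing"
    if score < 340:
        return "Proficient"
    return "Excellent"

for example_score in (150, 250, 300, 360):
    print(example_score, reach_rating(example_score))
```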


the previous system. Tenured teachers missing previous 
ratings were to receive four observations and a REACH 
rating in 2013-14 and then be placed on a biennial cycle in 
the following year. 
Appendix B


The CPS Framework for Teaching 


TABLE B.1 


The CPS Framework for Teaching 


Adapted from the Danielson Framework for Teaching and Approved by Charlotte Danielson 


Domain 1: Planning and Preparation 


a. Demonstrating Knowledge of Content and Pedagogy 
Knowledge of Content Standards Within and Across Grade Levels 
Knowledge of Disciplinary Literacy 
Knowledge of Prerequisite Relationships 
Knowledge of Content-Related Pedagogy 

b. Demonstrating Knowledge of Students 
Knowledge of Child and Adolescent Development 
Knowledge of the Learning Process 
Knowledge of Students’ Skills, Knowledge, and Language Proficiency 
Knowledge of Students’ Interests and Cultural Heritage 
Knowledge of Students’ Special Needs and Appropriate
Accommodations/Modifications
c. Selecting Instructional Outcomes 
Sequence and Alignment 
Clarity 
Balance 
d. Designing Coherent Instruction 
Unit/Lesson Design that Incorporates Knowledge of Students and 
Student Needs 

Unit/Lesson Alignment of Standards-Based Objectives, Assessments, 
and Learning Tasks 

Use of a Variety of Complex Texts, Materials and Resources, including 
Technology 

Instructional Groups 

Access for Diverse Learners 

e. Designing Student Assessment 
Congruence with Standards-Based Learning Objectives 
Levels of Performance and Standards 
Design of Formative Assessments 
Use for Planning 


Domain 2: The Classroom Environment 


a. Creating an Environment of Respect and Rapport 
Teacher Interaction with Students, including both Words and Actions 
Student Interactions with One Another, including both Words and
Actions

b. Establishing a Culture for Learning
Importance of Learning
Expectations for Learning and Achievement 
Student Ownership of Learning 

c. Managing Classroom Procedures 
Management of Instructional Groups 
Management of Transitions 
Management of Materials and Supplies 
Performance of Non-Instructional Duties 
Direction of Volunteers and Paraprofessionals 

d. Managing Student Behavior 
Expectations and Norms 
Monitoring of Student Behavior 
Fostering Positive Student Behavior 
Response to Student Behavior 


Domain 4: Professional Responsibilities


a. Reflecting on Teaching and Learning
Effectiveness
Use in Future Teaching
b. Maintaining Accurate Records
Student Completion of Assignments
Student Progress in Learning
Non-Instructional Records
c. Communicating with Families
Information and Updates about Grade Level Expectations and Student
Progress
Engagement of Families and Guardians as Partners in the Instructional
Program
Response to Families
Cultural Appropriateness
d. Growing and Developing Professionally
Enhancement of Content Knowledge and Pedagogical Skill
Collaboration and Professional Inquiry to Advance Student Learning
Participation in School Leadership Team and/or Teacher Teams
Incorporation of Feedback
e. Demonstrating Professionalism
Integrity and Ethical Conduct
Commitment to College and Career Readiness
Advocacy
Decision-Making
Compliance with School and District Regulations


Domain 3: Instruction 


a. Communicating with Students 
Standards-Based Learning Objectives 
Directions for Activities 
Content Delivery and Clarity 
Use of Oral and Written Language 
b. Using Questioning and Discussion Techniques 
Use of Low- and High-Level Questioning 
Discussion Techniques 
Student Participation and Explanation of Thinking 
c. Engaging Students in Learning 
Standards-Based Objectives and Task Complexity 
Access to Suitable and Engaging Texts 
Structure, Pacing and Grouping 
d. Using Assessment in Instruction 
Assessment Performance Levels 
Monitoring of Student Learning with Checks for Understanding 
Student Self-Assessment and Monitoring of Progress 
e. Demonstrating Flexibility and Responsiveness 
Lesson Adjustment 
Response to Student Needs 
Persistence 
Intervention and Enrichment 
Appendix C


Methods for Analyzing Relationships between REACH 
Scores and School Characteristics 


We find, descriptively, observation scores have a negative 
relationship with our school-level measures of student 
poverty, percent minority, and previous achievement. As 
the concentration of poverty within a school increases, 
teacher observation scores decrease on average. 

There is a weaker relationship between school-level 
measures of student poverty and teacher value-added 
scores, as value added explicitly controls for student 
poverty and other characteristics. 

Observation data analyzed in this report are from 
2013-14 and are described on page 8. Only teachers with 
at least two observations were included in our analysis 
of observation scores. Value-added data analyzed in 
this report are from 2012-13 and 2013-14, as we utilized 
multi-year averages for teachers when available. We uti- 
lized a two-level hierarchical linear model, controlling 
for teacher characteristics at level 1 and school char- 
acteristics at level 2. Our outcomes were professional 
practice scores (the weighted average of a teacher’s 
observation component ratings) and individual value- 


added scores. 


TABLE C.1 


Level 1 Model:
$Y_{ij} = \beta_{0j} + \beta_{1}\,\mathrm{Tenure}_{ij} + \beta_{2}\,\mathrm{AdvancedDegree}_{ij} + \beta_{3}\,\mathrm{NBCT}_{ij} + \varepsilon_{ij}$


Level 2 Model:
$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{LogEnrollment}_{j} + \gamma_{02}\,X_{j} + u_{0j}$


where:

$Y$ is one of three outcomes (observation score and math
and reading value-added scores) for teacher $i$ in school
$j$. $X$ are our school-level characteristics of interest, and
each one is included in our model in separate regressions.
The coefficient of interest is $\gamma_{02}$, which is the association
between our outcomes (observation and value-added
scores) and school-level characteristics (concentration
of poverty, previous achievement, percent minority).
Table C.1 presents coefficients from models predicting
teachers’ observation scores and Table C.2 presents
coefficients from models predicting teachers’ value-added
scores. School-level percentage African American
students is an indicator for schools with greater than
70 percent African American students.
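Substituting the level 2 equation into the level 1 equation gives a single combined equation, which shows how the school-level characteristic enters as a direct predictor of each teacher's score. This is a standard algebraic step rather than a formula stated in the report. The symbols $\gamma_{00}$, $\gamma_{01}$, $\gamma_{02}$, $u_{0j}$ (school-level residual), and $\varepsilon_{ij}$ (teacher-level residual) follow conventional two-level notation and are assumed here.

```latex
\begin{aligned}
Y_{ij} &= \gamma_{00} + \gamma_{01}\,\mathrm{LogEnrollment}_{j} + \gamma_{02}\,X_{j} \\
       &\quad + \beta_{1}\,\mathrm{Tenure}_{ij} + \beta_{2}\,\mathrm{AdvancedDegree}_{ij} + \beta_{3}\,\mathrm{NBCT}_{ij} \\
       &\quad + u_{0j} + \varepsilon_{ij}
\end{aligned}
```

Read this way, $\gamma_{02}$ is the expected difference in a teacher's score associated with a one-unit difference in the school characteristic $X_{j}$, holding the teacher-level controls and school size constant.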


The Association between Observation Scores and School Characteristics 


Observation Score Elementary High School 


Model 1 Model 2 Model 3 


School-Level 
Concentration of 
Student Poverty 


-14.688*** 


School-Level 
Percentage FRL 


-12.482*** 


School-Level 
Percentage African 
American Students 


HO) 


School-Level 
Previous 
Achievement 


Model 4 Model 1 Model 2 Model 3 Model 4 


“AIOE 


-9.584*** 


24.553*** 


(AnIOhs IZ O9GEs 


Note: The coefficients shown in this table are from regressions of observation scores with school characteristics controlling for teacher characteristics (tenure 
status, advanced degree, National Board Certification) and school size. Asterisks denote statistical significance: *** at the 0.01 level, ** at the 0.05 level, and * at 
the 0.10 level. All predictors are z-standardized except the school-level percentage of African American students. 
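Because the predictors are z-standardized, each coefficient can be read as the difference in score associated with a one standard deviation difference in the school characteristic. The sketch below shows the standardization step on invented values. The function name is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the report does not say which form was used.

```python
import numpy as np

def z_standardize(values, ddof=0):
    """Center values at zero and scale them to unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=ddof)

# Hypothetical school-level poverty rates for three schools.
poverty_rate = [0.5, 0.7, 0.9]
poverty_z = z_standardize(poverty_rate)
print(np.round(poverty_z, 3))
```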
We utilized two measures of economic status of students
in schools. The first is the percentage of students
eligible for free/reduced-price lunch in a school. The 
second, the concentration of student poverty, is derived 
from census data on students’ residential neighborhoods, 
measured at the census block group level (but averaged 
at the school level) and captures the percent of unem- 
ployed males over 16 and the percent of families with in- 
comes below the poverty line. Table C.3 shows summary 
statistics of these two measures of school-level poverty. 

Previous school-average achievement is calculated dif- 


ferently depending on the schools’ grade configuration. 


Elementary Schools: We ran a three-level hierarchical
linear model (HLM) using ISAT test scores from 2012-13
as the outcome. We included a measurement model
at level 1, individuals at level 2, and schools at level 3.72
In level 2, we included dummy variables indicating the
student’s grade, with grade 6 the omitted grade. Thus,
the school-level empirical Bayes estimates represent
the predicted average achievement for the students in
grade 6, adjusted for the grade structure of the school
and the amount of measurement error in the test scores.


TABLE C.2
The Association between Value-Added Scores and School Characteristics


Reading Value Added, Elementary                      Model 1          Model 2    Model 3    Model 4

School-Level Concentration of Student Poverty        -2.[illegible]
School-Level Percentage FRL                                           -1.751
School-Level Percentage African American Students                                0.1470
School-Level Previous Achievement                                                           5.438***


Math Value Added, Elementary                         Model 1          Model 2    Model 3    Model 4

School-Level Concentration of Student Poverty        1.807
School-Level Percentage FRL                                           0.667
School-Level Percentage African American Students                                3.241
School-Level Previous Achievement                                                           2.644**

Note: The coefficients shown in this table are from regressions of value-added scores with school characteristics controlling for teacher characteristics (tenure 
status, advanced degree, National Board certification) and school size. Asterisks denote statistical significance: *** at the 0.01 level, ** at the 0.05 level, and * at 
the 0.10 level. All predictors, except the school-level percentage of African American students, are z-standardized. 


TABLE C.3
Measures of School-Level Poverty

Measure                              Level   Mean    SD      Min      Max     Number of Schools
Percent Free/Reduced-Price Lunch     ELEM    0.835   0.228   0.101
                                     HS      0.857   0.154   0.308    0.995   101
SCON                                 ELEM    0.237   0.569   -1.097   2.20    425
                                     HS      0.331   0.404   -0.616   1.074   101


72 Raudenbush & Bryk (2002). 
High Schools: We used the average incoming student achievement level of ninth-graders in the 2012-13 freshman cohort for each school. This was calculated using a longitudinal three-level HLM with test scores from 2006-07 to 2011-12 as the outcome. Each student’s test score trajectory was calculated separately for math and reading, including the student’s year in school and the year squared, centered on grade 5. The student’s eighth-grade test score as predicted by the model is the incoming achievement. It takes into consideration the student’s entire test score history and levels the effects of random measurement error in individual test scores.

Correlations between our measures of school-level poverty and observation scores are similar for each of the four domains of the CPS Framework for Teaching.

TABLE C.4
REACH Scores Correlation with Measures of School-Level Concentration of Poverty

Correlation with Measures of Concentration of Economically Disadvantaged Students

                              School-Level Concentration   School-Level
                              of Student Poverty           Percentage FRL
Domain 1                      [illegible]                  -.2160
Domain 2                      -.3082                       -.2068
Domain 3                      [illegible]                  [illegible]
Domain 4                      [illegible]                  [illegible]
Observation Score             -.3346                       -.2374
Reading Value-Added Score     -.0471                       -.0468
Math Value-Added Score        .0450                        .0161
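
The elementary-school description in this appendix refers to school-level empirical Bayes estimates that adjust for measurement error. The sketch below illustrates only the general shrinkage idea behind such estimates; it is not the three-level HLM used in the report. The function name, the `error_variance` input, and the method-of-moments variance estimate are assumptions made for illustration.

```python
from statistics import mean, pvariance


def shrink_school_means(scores_by_school, error_variance):
    """Illustrative empirical Bayes (shrinkage) estimates of school means.

    scores_by_school: dict mapping school id -> list of student scores.
    error_variance: assumed variance of a single score around its school
        mean (a hypothetical input, not a value from the report).
    """
    raw_means = {s: mean(v) for s, v in scores_by_school.items()}
    grand_mean = mean(raw_means.values())

    # Sampling variance of each raw school mean shrinks as the school grows.
    sampling_var = {s: error_variance / len(v)
                    for s, v in scores_by_school.items()}

    # Method-of-moments estimate of the between-school variance, floored at 0.
    tau2 = max(pvariance(raw_means.values()) - mean(sampling_var.values()), 0.0)

    estimates = {}
    for school, raw in raw_means.items():
        denom = tau2 + sampling_var[school]
        reliability = tau2 / denom if denom > 0 else 0.0
        # Less reliable means are pulled further toward the grand mean.
        estimates[school] = grand_mean + reliability * (raw - grand_mean)
    return estimates
```

In this sketch a school with few tested students is pulled more strongly toward the overall mean than a large school, which is the sense in which such estimates account for the amount of measurement error.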
Math and Reading Value-Added Scores and High School Observation Scores by School-Level Poverty

Figures D.1, D.2, and D.3 display the distribution of teachers with the highest scores (top 20 percent) and lowest scores (bottom 20 percent) on reading and math value added for elementary schools, and on observation scores for high schools, across differing levels of school poverty (quintiles of school-level poverty). In these figures, school-level poverty is measured by the percentage of students eligible for free/reduced-price lunch.

For example, 18 percent of teachers with top scores on reading value added are in lowest-poverty schools and 18 percent are in highest-poverty schools. Teachers with bottom scores on reading value added (bottom 20 percent) are overrepresented in highest-poverty schools. Twenty-six percent of teachers with lowest scores on reading value added are in highest-poverty schools, while only 13 percent are in lowest-poverty schools.

In high schools, 45 percent of teachers with top scores on observations are in lowest-poverty schools and only 4 percent are in highest-poverty schools. Teachers with lowest scores on observations are slightly overrepresented in highest-poverty schools.

FIGURE D.1
Teachers with Lowest Reading Value-Added Scores Are Overrepresented in Highest-Poverty Schools

[Figure: Distribution of Teachers with the Highest and Lowest Reading Value-Added Scores by School Poverty Level. Panels: Teachers with Highest Scores; Teachers with Lowest Scores. Vertical axis: Percent of Teachers. Horizontal axis: Quintiles of School Poverty Level, from Lowest to Highest.]

Note: Elementary school only. 2,609 lowest-rated teachers and 2,594 highest-rated teachers.
FIGURE D.2
Teachers with Highest and Lowest Math Value-Added Scores Are More Evenly Distributed Among Schools

[Figure: Distribution of Teachers with the Highest and Lowest Math Value-Added Scores by School Poverty Level. Panels: Teachers with Highest Scores; Teachers with Lowest Scores. Vertical axis: Percent of Teachers. Horizontal axis: Quintiles of School Poverty Level, from Lowest to Highest.]

Note: Elementary school only. 2,609 lowest-rated teachers and 2,594 highest-rated teachers.


FIGURE D.3
High School Teachers with Highest Observation Scores Are Underrepresented in Highest-Poverty Schools

[Figure: Distribution of High School Teachers with the Highest and Lowest Observation Scores by School Poverty Level. Panels: Teachers with Highest Scores; Teachers with Lowest Scores. Vertical axis: Percent of Teachers. Horizontal axis: Quintiles of School Poverty Level, from Lowest to Highest.]

Note: High school only. 985 lowest-rated teachers and 996 highest-rated teachers.
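
The tabulation behind Figures D.1 through D.3 (top and bottom 20 percent of teachers, distributed across quintiles of school poverty) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the report: the function names and input dictionaries are hypothetical, and quintiles are formed here by simple rank order over schools and over teachers.

```python
def assign_quintiles(values):
    """Map each key to a quintile, 1 (lowest) to 5 (highest), by rank of its value."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {key: (rank * 5) // n + 1 for rank, key in enumerate(ordered)}


def distribution_by_poverty(teacher_scores, teacher_school, school_frl, top=True):
    """Percent of top- (or bottom-) 20-percent teachers in each poverty quintile.

    teacher_scores: dict teacher id -> score (hypothetical input).
    teacher_school: dict teacher id -> school id.
    school_frl: dict school id -> share of students eligible for
        free/reduced-price lunch.
    """
    school_quintile = assign_quintiles(school_frl)
    teacher_quintile = assign_quintiles(teacher_scores)
    target = 5 if top else 1
    selected = [t for t, q in teacher_quintile.items() if q == target]

    counts = {q: 0 for q in range(1, 6)}
    for teacher in selected:
        counts[school_quintile[teacher_school[teacher]]] += 1
    return {q: 100.0 * c / len(selected) for q, c in counts.items()}
```

If scores were unrelated to school poverty, each quintile would hold roughly 20 percent of the selected teachers; departures from 20 percent are what the figures describe as over- or underrepresentation.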
Percentages of First-Year Teachers and National Board Certified Teachers by School Poverty Level

Figures E.1 and E.2 display the distribution of new teachers and teachers with National Board Certification across schools with differing levels of school poverty (quintiles of school-level poverty). In these figures, school-level poverty is measured by the percentage of students eligible for free/reduced-price lunch. Figure E.1 shows that highest-poverty elementary and high schools tend to have higher percentages of first-year teachers. Figure E.2 shows that highest-poverty elementary and high schools tend to have lower percentages of teachers with National Board Certification.

FIGURE E.1
Percentages of First-Year Teachers in Each School-Poverty Quintile

[Figure: bar chart with separate series for Elementary and High School. Horizontal axis: School Poverty Level Quintiles, from Lowest to Highest.]

Note: Elementary schools: 1,700 new teachers. High schools: 606 new teachers. New teachers defined as teachers with <1.5 years of experience in the district.

FIGURE E.2
Percentages of NBCTs in Each School-Poverty Quintile

[Figure: bar chart with separate series for Elementary and High School. Horizontal axis: School Poverty Level Quintiles, from Lowest to Highest.]

Note: Elementary schools: 973 NBCT teachers. High schools: 488 NBCT teachers.
Change to Models Utilized for Analysis of the Relationships between REACH Scores and Measures of School Climate

In Chapter 1, we examined relationships between the climate and culture of schools and the observation and value-added scores of teachers in those schools, utilizing student and teacher perceptions of climate and culture from the districtwide My Voice, My School surveys from the spring of 2014.

In schools with stronger professional climate (as perceived by teachers) and stronger learning climate and instruction (as perceived by students), teachers tend to have higher value-added and observation scores. To analyze these relationships, we utilized a two-level hierarchical linear model, controlling for teacher characteristics at level 1 and school characteristics such as poverty, percentage minority, and previous achievement at level 2. Our outcomes were professional practice scores (the weighted average of a teacher’s observation scores) and individual value-added scores. We included each survey measure of school climate in a separate regression to avoid collinearity issues.
Level 1 Model:

$$Y_{ij} = \alpha_{0j} + \sum \alpha K_{ij} + r_{ij}$$

Level 2 Model:

$$\alpha_{0j} = \lambda_{00} + \lambda_{01} R_j + \sum \lambda N_j + u_{0j}$$

where:

$Y_{ij}$ is one of three outcomes (observation score, math value-added score, and reading value-added score) for teacher $i$ in school $j$. $K_{ij}$ is a vector of teacher characteristics (years of experience, gender, race/ethnicity, advanced degree) for teacher $i$ in school $j$. $N_j$ is a vector of school characteristics (e.g., school-level poverty, previous achievement) for school $j$.

$R_j$ represents our survey measures of school climate; each one is included in a separate regression. The coefficient of interest is $\lambda_{01}$, which is the association between our outcomes (observation and value-added scores) and school-level measures of climate. These coefficients are presented in Table F.1.
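
To see why the coefficient on the climate measure is the quantity of interest, the level 2 equation can be substituted into the level 1 equation. The combined form below is a sketch that assumes the notation $\alpha$ for level 1 coefficients and $\lambda$ for level 2 coefficients; the indices $k$ and $m$ over covariates are introduced here only for illustration and do not appear in the original.

```latex
Y_{ij} = \lambda_{00} + \lambda_{01} R_j
       + \sum_{m} \lambda_{0m} N_{mj}
       + \sum_{k} \alpha_{k} K_{kij}
       + u_{0j} + r_{ij}
```

Written this way, $\lambda_{01}$ is the expected difference in a teacher's score associated with a one-unit difference in the school's climate measure $R_j$, holding the teacher covariates $K$ and the school covariates $N$ constant, with $u_{0j}$ absorbing remaining school-level variation.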


TABLE F.1
REACH Scores and Survey Measures of School Climate

Survey Measures                                Observations    Reading Value Added    Math Value Added
                                               Coefficient     Coefficient            Coefficient
Academic Personalism (Student)                 [illegible]     3.714***               [illegible]
Academic Press (Student)                       1.842           4.314***               4.162***
Collective Responsibility (Teacher)            4.080***        4.419***               [illegible]
Outreach to Parents (Teacher)                  [illegible]     4.937***               [illegible]
Peer Support for Academic Work (Student)       [illegible]     4.544***               4.097***
Principal Instructional Leadership (Teacher)   [illegible]     4.74***                [illegible]
Program Coherence (Teacher)                    4.723***        [illegible]            [illegible]
Quality of Student Discussion (Teacher)        [illegible]     [illegible]            [illegible]
Quality Professional Development (Teacher)     [illegible]     4.495***               [illegible]
Safety (Student)                               4.791***        [illegible]            1.843
School Commitment (Teacher)                    8.144***        4.867***               [illegible]
Socialization of New Teachers (Teacher)        [illegible]     [illegible]            [illegible]
Student Responsibility (Teacher)               [illegible]     [illegible]            [illegible]
Student-Teacher Trust (Student)                [illegible]     [illegible]            [illegible]
Teacher Influence (Teacher)                    7.284***        [illegible]            [illegible]
Teacher Safety (Teacher)                       [illegible]     [illegible]            [illegible]
Teacher-Parent Trust (Teacher)                 [illegible]     4.163***               2.234*
Teacher-Principal Trust (Teacher)              [illegible]     [illegible]            [illegible]
Teacher-Teacher Trust (Teacher)                1.744           1.412                  1.190

Note: Teacher Safety is a negatively valenced measure where higher values indicate teachers feeling less safe.
Appendix G

Models Utilized for Analysis of Teacher Background Characteristics and Evaluation Scores

In Chapter 2, we examined relationships between observation scores, value-added scores, and teacher background characteristics (years of experience, advanced degree, National Board Certification, teacher race/ethnicity, and teacher gender).

To analyze these relationships, we utilized a school fixed-effect model comparing teachers within the same schools across all schools.

$$Y_{ij} = \beta X_{ij} + \alpha_j + \varepsilon_{ij}$$

where:

$Y_{ij}$ is one of two outcomes (observation score and value-added score) for teacher $i$ in school $j$.

$X_{ij}$ represents our variables of interest (years of experience, gender, race/ethnicity, advanced degree) for teacher $i$ in school $j$. The associations for each predictor are estimated in separate models. All predictors are categorical; for those with more than two categories, we entered a series of dummy variables in the model.

$\beta$ is the coefficient for a particular teacher background characteristic (listed above).

$\alpha_j$ are fixed effects for each school $j$.

TABLE G.1
Years of Experience

Experience in the District   Observation Score   Reading Value-Added Score   Math Value-Added Score
1 Year                       [illegible]         244.1                       [illegible]
2-5 Years                    [illegible]         249.8                       252
6-10 Years                   317.1               250.2                       252
10-15 Years                  313.6               251.9                       [illegible]
>15 Years                    307.4               249.0                       247.4

Note: Unadjusted average scores are raw averages not adjusting for other teacher characteristics or school effects. Bold indicates a statistically significant difference from first-year teachers (p<0.05).
TABLE G.2
Advanced Degrees and National Board Certification

                    Unadjusted Average Scores
                    Observation Score   Reading Value-Added Score   Math Value-Added Score
Non-NBCTs           307.3               248.7                       250.0
NBCTs               336.4               [illegible]                 [illegible]
BA Only             305.0               247.7                       250.1
Advanced Degree     313.2               250.1                       251.0

Note: Unadjusted average scores are raw averages not adjusting for other teacher characteristics or school effects. Bold indicates a statistically significant difference from the excluded group (non-NBCTs or BA only) at p<0.05.


TABLE G.3
Average Scores by Teacher Race/Ethnicity and Gender

                    Unadjusted Average Scores
                    Observation Score   Reading Value-Added Score   Math Value-Added Score
Male                301.8               245.0                       248.9
Female              312.9               250.2                       251.1
White               319.0               248.9                       [illegible]
African American    289.3               249.6                       249.9
Latino              [illegible]         244.3                       248.9
Other               309.7               [illegible]                 254.6

Note: Unadjusted average scores are raw averages not adjusting for other teacher characteristics or school effects. Bold indicates a statistically significant difference from the excluded group (male for gender, white for race/ethnicity) at p<0.05.
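
The school fixed-effect specification in this appendix compares teachers only with colleagues in the same school. For a single predictor, one standard way to obtain the same coefficient as a model with a dummy for every school is to subtract school means from both the outcome and the predictor and then fit a slope to the deviations. The sketch below illustrates that within-school transformation; the function name and the record layout are hypothetical, and it is not the authors' estimation code.

```python
from collections import defaultdict


def within_school_slope(records):
    """Estimate beta in Y_ij = beta * X_ij + alpha_j + e_ij by demeaning within schools.

    records: iterable of (school_id, x, y) tuples, where x is a 0/1 dummy
        (or numeric predictor) and y is a teacher's score.
    """
    by_school = defaultdict(list)
    for school, x, y in records:
        by_school[school].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for rows in by_school.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        # Deviations from the school mean remove the school effect alpha_j.
        for x, y in rows:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2

    if sxx == 0:
        raise ValueError("predictor does not vary within any school")
    return sxy / sxx
```

Schools in which every teacher shares the same value of the predictor contribute nothing to the estimate, which is why the comparison is described as being among teachers within the same schools.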